1 Introduction

This vignette introduces the SpaceTrooper package for spatial data analysis from platforms like Xenium, Merfish, and CosMx. We cover data loading, quality control, and result visualization.

2 Installation

To install SpaceTrooper, use the following commands:

# Install BiocManager if not already installed, then install SpaceTrooper
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("drighelli/SpaceTrooper")

3 Data Loading

In this section, we load data from various platforms using the package’s functions. The goal is to provide a uniform SpatialExperiment object across all technologies, allowing for consistent QC analysis.

The functions in SpaceTrooper compute missing metrics as needed and allow for the inclusion of polygons with the keep_polygons argument. This stores polygons in the colData of the SpatialExperiment. We suggest to use the keep_polygons argument for technologies like Xenium and Merfish/Merscope because we already load the polygons to compute missing metrics in these cases.

Eventually, I/O functions will be Pull Requested into the SpatialExperimentIO Bioconductor package, to discuss with the authors.

Additionally, we are developing compatibility with the SpatialFeatureExperiment class and Python’s spatialData class for cross-language support.

# Load the SpaceTrooper library
library(SpaceTrooper)

# Load Xenium data into a Spatial Experiment object (SPE)
xeniumFolder <- "~/Downloads/Xenium_data/pancreas/"
spe <- readXeniumSPE(xeniumFolder, compute_missing_metrics=TRUE,
                     keep_polygons=TRUE)

# Load Merfish data into an SPE with parquet boundaries
merscopeFolder <- "~/Downloads/Merfish_data/human_uterine_cancer_patient2/"
spe <- readMerfishSPE(merscopeFolder, boundaries_type="parquet",
                      compute_missing_metrics=TRUE, keep_polygons=TRUE)

# Load CosMx data into an SPE without polygons
cosmxFolder <- "~/Downloads/CosMx_data/DBKero/CosMx_Breast/CosMx_data_Case2/"
spe <- readCosmxSPE(cosmxFolder)

4 Data Analysis based on CosMx

The package offers several functions for spatial data analysis, including quality control and visualization.

This tutorial focuses on CosMx data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx and are not available for other technologies like Xenium and Merfish/Merscope.

For CosMx data, you don’t need to keep_polygons initially because the metrics can be computed without them. Polygons can be loaded later if needed.

library(SpaceTrooper)

# Reload CosMx data with sample name and without polygons
cosmxFolder <- "~/Downloads/CosMx_data/DBKero/CosMx_Breast/CosMx_data_Case2/"
spe <- readCosmxSPE(cosmxFolder, sample_name="DBKero_CosMx",
                    keep_polygons=FALSE)
## a custom function for the workshop
source(system.file("scripts/labelsfunct.R", package="SpaceTrooper"))
spe <- addLabelsDBKero(spe)

spe
## class: SpatialExperiment 
## dim: 1010 59284 
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(1010): RAMP2 CD83 ... NegPrb09 NegPrb10
## rowData names(0):
## colnames(59284): f1_c1 f1_c2 ... f45_c1993 f45_c1994
## colData names(24): fov cellID ... labels labels_colors
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id

5 Field of Views (FOVs) Visualization

The plotCellsFovs function shows a map of the FoVs within an experiment. This plot sis specific to CosMx data and uses cell centroids.

# Plot the cells within their respective Field of Views (FOVs)
plotCellsFovs(spe)

6 Visualizing cell types

To focus on cell centroids, use the plotCentroidsSPE function. It provides a colour_by argument, similar to scater. Additionally, if you have a column with a color palette it automatically maps it to the values passed in the colour_by argument.

# Plot the centroids of the cells in the SPE
plotCentroidsSPE(spe, colour_by="labels")

# Plot the centroids of the cells in the SPE with personalized labels
plotCentroidsSPE(spe, colour_by="labels", palette="labels_colors")

7 Quality control

The spatialPerCellQC function, inspired by scater::addPerCellQC, computes additional metrics for each cell in the SpatialExperiment. It also allows for the detection of negative control probes, which is crucial for QC.

# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("NegPrb", "Negative",
                                           "SystemControl"))

8 Metrics Histograms

You can investigate individual metrics by viewing their histograms. For outliers, use the use_fences argument to display the fences computed by computeSpatialOutlier.

# Plot a histogram of counts (sum)
plotMetricHist(spe, metric="sum")

# Plot a histogram of cell areas (Area_um)
plotMetricHist(spe, metric="Area_um")

# Plot a histogram of proportion of counts respect to the cell area in micron 
plotMetricHist(spe, metric="log2CountArea")

9 Spatial Outlier Detection

Spatial outlier detection is another critical step in QC. While the flag score addresses some metrics, other outlier detection methods may be needed.

The computeSpatialOutlier function allows the computation of medcouple statistics on a specified metric (compute_by argument). It can also use scuttle::isOutlier for asymmetric distributions. The method argument supports “mc”, “scuttle”, or “both”.

This outlier detection approach can be used to decide if and which cells can be discarded on a singular metric.

# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, compute_by="Area_um", method="both")

# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, compute_by="Mean.DAPI", method="both")

If we computed outliers with the computeSpatialOutlier function we can also visualize which fences have been used to create the filter on the cells.

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", use_fences="Area_um_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", use_fences="Area_um_outlier_sc")

# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", use_fences="Mean.DAPI_outlier_mc")

# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", use_fences="Mean.DAPI_outlier_sc")

10 The Flag Score

Next, we use computeQCScore to calculate a flag score based on previously computed metrics. The flag score combines transcript counts related to cell area, the aspect ratio of each cell, and its distance from the FoV border (only for CosMx, this last one is not used otherwise).

# Calculate quality scores for each cell
spe <- computeQCScore(spe)

Logical filters can then be computed using computeFilterFlags, which requires thresholds for various metrics. Currently, the function considers:

  • Flag Score (fs_threshold): Cells with scores below this threshold (default 0.5) are flagged for exclusion. This value can be used to indicate the quantile for the filtering when setting the use_fs_quantiles argument to TRUE.

  • Flag Score Quantiles (use_fs_quantiles): Option to filter based on quantiles (default FALSE).

  • Total Counts (total_threshold): Minimum count threshold (default 0).

  • Negative Probe Ratio (ctrl_tot_ratio_threshold): Minimum ratio of negative probes to total counts (default 0.1).

# Compute flags to identify cells for filtering
spe <- computeFilterFlags(spe, fs_threshold=0.7)

11 Polygon Addition and Visualization

To better understand the flag score values we start to load the polygons, giving us a better overview of the cells characteristics.

We can load and add polygons to your SPE object using the following functions. Each technology has its own readPolygons function to standardize the loaded sf object and handle different file types.

# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)

# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)

Once the polygons are stored in an sf object within colData, you can visualize them using functions based on the tmap library.

# Subset the SPE to include only specified FOVs
spe1112 <- spe[,spe$fov %in% c(11:12)]

# Plot the polygons of the selected cells
plotPolygonsSPE(spe1112, bg_color="white")

# Plot polygons colored by cell area
plotPolygonsSPE(spe1112, colour_by="log2AspectRatio")

plotPolygonsSPE(spe1112, colour_by="Area_um")

plotPolygonsSPE(spe1112, colour_by="flag_score", palette="viridis")

plotPolygonsSPE(spe1112, colour_by="is_fscore_outlier")

12 Fov Zoom and Map

The plotZoomFovsMap function allows you to visualize a map of the FoVs with a zoom-in of selected FoVs, colored by the colour_by argument. You can also toggle between tmap and ggplot2 for visualization (this is a work-in-progress).

## not working in vignette compilation
plotZoomFovsMap(spe, fovs=c(16:19), colour_by="flag_score")
plotZoomFovsMap(spe, fovs=c(16:19), colour_by="is_fscore_outlier", useggplot=FALSE)

13 Conclusion

In this vignette, we explored the main functionalities of the SpaceTrooper package for spatial data analysis. Main steps shown are: * data loading: Xenium, Merfish/Merscope, CosMx * polygons loading: Xenium, Merfish/Merscope, CosMx * quality control: + outlier detection: medcouple and scuttle MAD + flag score: a score combining transcript counts, cell area, aspect ratio and distance from the FoV border * visualization: + centroids: with ggplot2 + polygons: sf + tmap

14 Pending

14.1 Other Features

We released several functions for polygon handling, sf object manipulation, and palette creation. These will be improved and generalized in future versions.

More vignettes to come:

  • Polygons handling

  • sf object accessory functions

  • palette creation from colData (work-in-progress)

14.2 More to come

Interoperability:

  • within R: SpatialFeatureExperiment support

  • outside R: spatialData Python class I/O

Visualization:

  • more plots:

    • interactive plots with polygons

    • metrics violin plot

    • metrics scatter plot

    • multiple metrics polygons visualization

15 Session Information

sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Rome
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpaceTrooper_0.1.0          SpatialExperiment_1.12.0   
##  [3] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
##  [5] Biobase_2.62.0              GenomicRanges_1.54.1       
##  [7] GenomeInfoDb_1.38.1         IRanges_2.36.0             
##  [9] S4Vectors_0.40.1            BiocGenerics_0.48.1        
## [11] MatrixGenerics_1.14.0       matrixStats_1.1.0          
## [13] BiocStyle_2.30.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3        rstudioapi_0.16.0        
##   [3] jsonlite_1.8.7            magrittr_2.0.3           
##   [5] ggbeeswarm_0.7.2          magick_2.8.1             
##   [7] farver_2.1.1              rmarkdown_2.25           
##   [9] zlibbioc_1.48.0           vctrs_0.6.4              
##  [11] DelayedMatrixStats_1.24.0 RCurl_1.98-1.13          
##  [13] terra_1.7-55              base64enc_0.1-3          
##  [15] rstatix_0.7.2             leafsync_0.1.0           
##  [17] htmltools_0.5.7           S4Arrays_1.2.0           
##  [19] BiocNeighbors_1.20.0      broom_1.0.5              
##  [21] raster_3.6-26             Rhdf5lib_1.24.0          
##  [23] SparseArray_1.2.2         rhdf5_2.46.0             
##  [25] sass_0.4.7                KernSmooth_2.23-22       
##  [27] bslib_0.5.1               htmlwidgets_1.6.2        
##  [29] stars_0.6-4               cachem_1.0.8             
##  [31] lifecycle_1.0.4           pkgconfig_2.0.3          
##  [33] rsvd_1.0.5                Matrix_1.6-4             
##  [35] R6_2.5.1                  fastmap_1.1.1            
##  [37] GenomeInfoDbData_1.2.11   digest_0.6.33            
##  [39] colorspace_2.1-0          scater_1.30.0            
##  [41] leafem_0.2.3              dqrng_0.3.1              
##  [43] irlba_2.3.5.1             crosstalk_1.2.0          
##  [45] ggpubr_0.6.0              beachmat_2.18.0          
##  [47] labeling_0.4.3            lwgeom_0.2-13            
##  [49] fansi_1.0.5               abind_1.4-5              
##  [51] compiler_4.3.0            proxy_0.4-27             
##  [53] withr_2.5.2               bit64_4.0.5              
##  [55] backports_1.4.1           BiocParallel_1.36.0      
##  [57] carData_3.0-5             viridis_0.6.4            
##  [59] DBI_1.1.3                 highr_0.10               
##  [61] HDF5Array_1.30.0          R.utils_2.12.2           
##  [63] ggsignif_0.6.4            tmaptools_3.1-1          
##  [65] DelayedArray_0.28.0       leaflet_2.2.1            
##  [67] rjson_0.2.21              classInt_0.4-10          
##  [69] tools_4.3.0               units_0.8-4              
##  [71] vipor_0.4.5               beeswarm_0.4.0           
##  [73] tmap_3.3-4                R.oo_1.25.0              
##  [75] glue_1.6.2                rhdf5filters_1.14.1      
##  [77] grid_4.3.0                sf_1.0-14                
##  [79] generics_0.1.3            gtable_0.3.4             
##  [81] R.methodsS3_1.8.2         class_7.3-22             
##  [83] tidyr_1.3.0               data.table_1.14.8        
##  [85] sp_2.1-1                  BiocSingular_1.18.0      
##  [87] ScaledMatrix_1.10.0       car_3.1-2                
##  [89] utf8_1.2.4                XVector_0.42.0           
##  [91] ggrepel_0.9.4             pillar_1.9.0             
##  [93] limma_3.58.1              robustbase_0.99-3        
##  [95] dplyr_1.1.3               lattice_0.22-5           
##  [97] bit_4.0.5                 tidyselect_1.2.0         
##  [99] locfit_1.5-9.8            scuttle_1.12.0           
## [101] sfheaders_0.4.4           knitr_1.45               
## [103] gridExtra_2.3             bookdown_0.36            
## [105] edgeR_4.0.1               xfun_0.41                
## [107] statmod_1.5.0             DropletUtils_1.22.0      
## [109] DEoptimR_1.1-3            yaml_2.3.7               
## [111] evaluate_0.23             codetools_0.2-19         
## [113] tibble_3.2.1              BiocManager_1.30.22      
## [115] cli_3.6.2                 arrow_16.1.0.100000450   
## [117] munsell_0.5.0             jquerylib_0.1.4          
## [119] dichromat_2.0-0.1         Rcpp_1.0.11              
## [121] png_0.1-8                 XML_3.99-0.15            
## [123] parallel_4.3.0            ggplot2_3.4.4            
## [125] assertthat_0.2.1          sparseMatrixStats_1.14.0 
## [127] bitops_1.0-7              viridisLite_0.4.2        
## [129] scales_1.3.0              e1071_1.7-13             
## [131] purrr_1.0.2               crayon_1.5.2             
## [133] rlang_1.1.2               cowplot_1.1.1